Improving accuracy of students' final grade prediction model using optimal equal width binning and synthetic minority over-sampling technique
نویسندگان
چکیده
There is a perpetual elevation in demand for higher education in the last decade all over the world; therefore, the need for improving the education system is imminent. Educational data mining is a newly-visible area in the field of data mining and it can be applied to better understanding the educational systems in Bangladesh. In this research, we present how data can be preprocessed using a discretization method called the Optimal Equal Width Binning and an over-sampling technique known as the Synthetic Minority Over-Sampling (SMOTE) to improve the accuracy of the students’ final grade prediction model for a particular course. In order to validate our method we have used data from a course offered at North South University, Bangladesh. The result obtained from the experiment gives a clear indication that the accuracy of the prediction model improves significantly when the discretization and over-sampling methods are applied.
منابع مشابه
ADABOOST ENSEMBLE ALGORITHMS FOR BREAST CANCER CLASSIFICATION
With an advance in technologies, different tumor features have been collected for Breast Cancer (BC) diagnosis, processing of dealing with large data set suffers some challenges which include high storage capacity and time require for accessing and processing. The objective of this paper is to classify BC based on the extracted tumor features. To extract useful information and diagnose the tumo...
متن کاملA critical assessment of imbalanced class distribution problem: The case of predicting freshmen student attrition
Predicting student attrition is an intriguing yet challenging problem for any academic institution. Classimbalanced data is a common in the field of student retention, mainly because a lot of students register but fewer students drop out. Classification techniques for imbalanced dataset can yield deceivingly high prediction accuracy where the overall predictive accuracy is usually driven by the...
متن کاملBlending Propensity Score Matching and Synthetic Minority Over-sampling Technique for Imbalanced Classification
Real world data sets often contain disproportionate sample sizes of observed groups making the task of prediction algorithms very difficult. One of the many ways to combat inherit bias from class imbalance data is to perform re-sampling. In this paper we discuss two popular re-sampling approaches proposed in literature, Synthetic Minority Over-sampling Technique (SMOTE) and Propensity Score Mat...
متن کاملImproving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering
Classification is an one of the important parts of data mining and knowledge discovery. In most cases, the data that is utilized to used to training the clusters is not well distributed. This inappropriate distribution occurs when one class has a large number of samples but while the number of other class samples is naturally inherently low. In general, the methods of solving this kind of prob...
متن کاملA method to solve the problem of missing data, outlier data and noisy data in order to improve the performance of human and information interaction
Abstract Purpose: Errors in data collection and failure to pay attention to data that are noisy in the collection process for any reason cause problems in data-based analysis and, as a result, wrong decision-making. Therefore, solving the problem of missing or noisy data before processing and analysis is of vital importance in analytical systems. The purpose of this paper is to provide a metho...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Decision Analytics
دوره 2 شماره
صفحات -
تاریخ انتشار 2015